Abstract

Let $P$ be a probability distribution on $q$-dimensional space. The so-called Diaconis-Freedman effect means that for a fixed dimension $d \ll q$, most $d$-dimensional projections of $P$ look like a scale mixture of spherically symmetric Gaussian distributions. The present paper provides necessary and sufficient conditions for this phenomenon in a suitable asymptotic framework with increasing dimension $q$. It turns out that the conditions formulated by Diaconis and Freedman (1984) are not only sufficient but necessary as well. Moreover, letting $\hat P$ be the empirical distribution of $n$ independent random vectors with distribution $P$, we investigate the behavior of the empirical process $\sqrt{n}(\hat P - P)$ under random projections, conditional on $\hat P$.



1 Introduction 



A standard method of exploring high-dimensional datasets is to examine various low-dimensional projections thereof. In fact, many statistical procedures are based explicitly or implicitly on a "projection pursuit", cf. Huber (1985). Diaconis and Freedman (1984) showed that under weak regularity conditions on a distribution $P = P^{(q)}$ on $\mathbb{R}^q$, "most" $d$-dimensional orthonormal projections of $P$ are similar (in the weak topology) to a mixture of centered, spherically symmetric Gaussian distributions on $\mathbb{R}^d$ if $q$ tends to infinity while $d$ is fixed. A graphical demonstration of this disconcerting phenomenon is given by Buja et al. (1996). Precise quantitative analyses are provided by Meckes (2009, 2011) for situations where most projections are approximately Gaussian. The present paper provides further insight into the general phenomenon. We extend Diaconis and Freedman's (1984) results in two directions.
Section 2 gives necessary and sufficient conditions on the sequence $(P^{(q)})_{q \ge d}$ such that "most" $d$-dimensional projections of $P$ are similar to some distribution $Q$ on $\mathbb{R}^d$. It turns out that these conditions are essentially the conditions of Diaconis and Freedman (1984). The novelty here is necessity. The limit distribution $Q$ is automatically a mixture of centered, spherically symmetric Gaussian distributions. The family of such measures arises in Eaton (1981) in a somewhat different context.

More precisely, let $\Gamma = \Gamma^{(q)}$ be uniformly distributed on the set of column-wise orthonormal matrices in $\mathbb{R}^{q \times d}$ (cf. Section 4.2). Defining

$$\gamma^\top P := \mathcal{L}_{X \sim P}(\gamma^\top X)$$

for $\gamma \in \mathbb{R}^{q \times d}$, we investigate under what conditions the random distribution $\Gamma^\top P$ converges weakly in probability to an arbitrary fixed distribution $Q$ as $q \to \infty$, while $d$ is fixed.

In Section 3 we study the relationship between $P = P^{(q)}$ and the empirical distribution $\hat P = \hat P^{(q,n)}$ of $n$ independent random vectors with distribution $P$, also independent from the projection matrix $\Gamma = \Gamma^{(q)}$. Suppose that the distributions $P^{(q)}$ satisfy the conditions of Section 2. Then the random distributions $\hat P^{(q,n)}$ satisfy these conditions, too, as $q$ and $n$ tend to infinity. Furthermore, the standardized empirical measure $n^{1/2}(\Gamma^\top \hat P - \Gamma^\top P)$ satisfies a conditional Central Limit Theorem given the data $\hat P$.

Proofs are deferred to Section 4. The main ingredients are Poincaré's (1912) Lemma and a method invented by Hoeffding (1952) in order to prove weak convergence of conditional distributions. Further we utilize standard results from weak convergence and empirical process theory.

2 The Diaconis-Freedman Effect 

Let us first settle some terminology. A random distribution $\hat Q$ on a separable metric space $(M, \rho)$ is a mapping from some probability space into the set of Borel probability measures on $M$ such that $\int f \, d\hat Q$ is measurable for any function $f \in C_b(M)$, the space of bounded, continuous functions on $M$. We say that a sequence $(\hat Q_k)_k$ of random distributions on $M$ converges weakly in probability to some fixed distribution $Q$ if for each $f \in C_b(M)$,

$$\int f \, d\hat Q_k \to_p \int f \, dQ \quad \text{as } k \to \infty.$$

In symbols, $\hat Q_k \to_{w,p} Q$ as $k \to \infty$. Standard approximation arguments (e.g. as in van der Vaart and Wellner, 1996, Section 1.12) show that $(\hat Q_k)_k$ converges weakly in probability to $Q$ if, and only if,

$$D_{\rm BL}(\hat Q_k, Q) := \sup_{f \in \mathcal{F}_{\rm BL}} \Bigl| \int f \, d\hat Q_k - \int f \, dQ \Bigr| \to_p 0 \quad (k \to \infty),$$

where $\mathcal{F}_{\rm BL}$ stands for the class of functions $f : M \to [-1, 1]$ such that $|f(x) - f(y)| \le \rho(x, y)$ for all $x, y \in M$.

Now we can state the first result. Here and throughout, $\|\cdot\|$ denotes Euclidean norm and $\mathcal{N}_{d,v}$ stands for the Gaussian distribution on $\mathbb{R}^d$ with mean vector $0$ and covariance matrix $v I_d$.

Theorem 2.1 The following two assertions on the sequence $(P^{(q)})_{q \ge d}$ are equivalent:

(A1) There exists a probability measure $Q$ on $\mathbb{R}^d$ such that

$$\Gamma^\top P \to_{w,p} Q \quad \text{as } q \to \infty.$$

(A2) If $X = X^{(q)}$, $\tilde X = \tilde X^{(q)}$ are independent random vectors with distribution $P$, then

$$\mathcal{L}(\|X\|^2/q) \to_w R \quad \text{and} \quad X^\top \tilde X / q \to_p 0 \quad \text{as } q \to \infty$$

for some probability measure $R$ on $[0, \infty)$.

The limit distribution $Q$ in (A1) is a normal mixture, precisely,

$$Q = \int \mathcal{N}_{d,v} \, R(dv)$$

with the limiting distribution $R$ in (A2).

Corollary 2.2 The random probability measure $\Gamma^\top P$ converges weakly in probability to the standard Gaussian distribution $\mathcal{N}_{d,1}$ if, and only if, the following condition is satisfied:

(B) For independent random vectors $X = X^{(q)}$, $\tilde X = \tilde X^{(q)}$ with distribution $P$,

$$\|X\|^2/q \to_p 1 \quad \text{and} \quad X^\top \tilde X / q \to_p 0 \quad \text{as } q \to \infty. \qquad \Box$$
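The following numerical sketch illustrates Corollary 2.2 under an assumed example that is not taken from the text: $P$ is the uniform distribution on the vertices of the cube $\{-1,+1\}^q$, for which $\|X\|^2/q = 1$ and $X^\top \tilde X/q \to_p 0$, so condition (B) holds. The function names, sample sizes and the choice $d = 1$ are illustrative assumptions only.

```python
import numpy as np
from math import erf, sqrt


def std_normal_cdf(t):
    """Standard Gaussian distribution function, evaluated elementwise."""
    return np.array([0.5 * (1.0 + erf(x / sqrt(2.0))) for x in np.atleast_1d(t)])


def ks_distance_to_gaussian(sample):
    """Kolmogorov-Smirnov distance between the empirical cdf of a 1-d sample and N(0, 1)."""
    x = np.sort(sample)
    n = x.size
    cdf = std_normal_cdf(x)
    upper = np.max(np.arange(1, n + 1) / n - cdf)  # F_n(x) - Phi(x)
    lower = np.max(cdf - np.arange(0, n) / n)      # Phi(x) - F_n(x-)
    return max(upper, lower)


def projected_points(q, n_points, rng):
    """Draw n_points vertices of {-1, +1}^q and project them onto one random unit direction."""
    points = rng.choice([-1.0, 1.0], size=(n_points, q))
    direction = rng.standard_normal(q)
    direction /= np.linalg.norm(direction)  # uniform on the unit sphere
    return points @ direction


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for q in (2, 20, 500):
        dist = ks_distance_to_gaussian(projected_points(q, 2000, rng))
        print(f"q = {q:4d}: KS distance to N(0,1) = {dist:.3f}")
```

For small $q$ the projected distribution has only a few atoms and stays far from Gaussian, whereas for large $q$ the distance should shrink to the order of the sampling error.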



The implication "(A2) $\Rightarrow$ (A1)" in Theorem 2.1 as well as sufficiency of condition (B) in Corollary 2.2 are due to Diaconis and Freedman (1984, Theorem 1.1 and Proposition 4.2). They considered only (deterministic) empirical distributions $P$, but the extension to arbitrary distributions $P$ is straightforward; see also Section 3.

It should be pointed out here that neither Theorem 2.1 nor Corollary 2.2 is just a consequence of Poincaré's (1912) Lemma, although the latter is somehow at the heart of the proof. Poincaré showed that if $U_q = (U_{q,i})_{i=1}^q$ is uniformly distributed on the unit sphere in $\mathbb{R}^q$, then the Lebesgue density of $q^{1/2} U_{q,1}$ converges uniformly to the standard Gaussian density on $\mathbb{R}$. Translated into the present setting, one can show that for a fixed vector $x = x^{(q)} \in \mathbb{R}^q \setminus \{0\}$, the Lebesgue density of the random vector $\Gamma^\top x$ converges uniformly to the Lebesgue density of $\mathcal{N}_{d,v}$ as $q \to \infty$ and $\|x\|^2/q \to v > 0$.
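Poincaré's Lemma can be checked numerically through moments. The sketch below is an illustration under stated assumptions, with a hypothetical function name; it uses the elementary fact that $\mathbb{E}\bigl((q^{1/2}U_{q,1})^4\bigr) = 3q/(q+2)$, which equals $1.8$ for $q = 3$ (where $U_{q,1}$ is uniform on $[-1,1]$) and tends to the Gaussian value $3$ as $q \to \infty$.

```python
import numpy as np


def first_coordinate_scaled(q, n_draws, rng):
    """Draw sqrt(q) * U_{q,1}, where U_q is uniform on the unit sphere in R^q."""
    z = rng.standard_normal((n_draws, q))
    u = z / np.linalg.norm(z, axis=1, keepdims=True)  # normalised Gaussian vectors
    return np.sqrt(q) * u[:, 0]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    for q in (3, 20, 200):
        draws = first_coordinate_scaled(q, 5000, rng)
        print(f"q = {q:3d}: empirical 4th moment = {np.mean(draws ** 4):.3f}, "
              f"exact = {3 * q / (q + 2):.3f}")
```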

Example 2.3 Condition (A2) is not a very restrictive requirement. For instance, suppose that $X = U(\mu_k + \sigma_k Z_k)_{k=1}^q$, where $(Z_k)_{k \ge 1}$ is a sequence of independent, identically distributed random variables with mean zero and variance one, while $U = U^{(q)}$ is an orthogonal matrix in $\mathbb{R}^{q \times q}$ and $\mu = \mu^{(q)} \in \mathbb{R}^q$, $\sigma = \sigma^{(q)} \in [0, \infty)^q$. Then condition (A2) is satisfied if, and only if,

(A3) $\|\mu\|^2/q \to 0$, $\|\sigma\|^2/q \to v \ge 0$ and $\max_{1 \le k \le q} \sigma_k^2/q \to 0$

as $q \to \infty$; see Section 4. Here $R = \delta_v$ and $Q = \mathcal{N}_{d,v}$.

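Example 2.4 Suppose that $X \sim P^{(q)}$ has independent, identically distributed components such that

$$\mathbb{P}(X_i = \sqrt{q}) = 1 - \mathbb{P}(X_i = 0) = \pi_q,$$

where

$$\lim_{q \to \infty} q \pi_q = \lambda > 0.$$

Then $\mathcal{L}(\|X\|^2/q) = \mathrm{Bin}(q, \pi_q) \to_w \mathrm{Poiss}(\lambda)$ and $\mathcal{L}(X^\top \tilde X/q) = \mathrm{Bin}(q, \pi_q^2) \to_w \delta_0$ as $q \to \infty$. Hence (A2) is satisfied with $R = \mathrm{Poiss}(\lambda)$.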
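Example 2.4 is easy to simulate. The sketch below assumes $\pi_q = \lambda/q$ exactly (one admissible choice) and uses hypothetical names; it compares the first two empirical moments of $\|X\|^2/q$ with those of $\mathrm{Poiss}(\lambda)$ and reports how often $X^\top \tilde X/q$ vanishes.

```python
import numpy as np


def simulate_example(q, lam, n_draws, rng):
    """Simulate ||X||^2/q and X^T X~/q for i.i.d. components in {0, sqrt(q)}.

    Assumption for illustration: P(X_i = sqrt(q)) = lam / q.
    """
    pi_q = lam / q
    x = np.sqrt(q) * (rng.random((n_draws, q)) < pi_q)
    x_tilde = np.sqrt(q) * (rng.random((n_draws, q)) < pi_q)
    norms = np.sum(x ** 2, axis=1) / q         # Bin(q, pi_q) distributed
    inner = np.sum(x * x_tilde, axis=1) / q    # Bin(q, pi_q^2) distributed
    return norms, inner


if __name__ == "__main__":
    rng = np.random.default_rng(4)
    norms, inner = simulate_example(q=1000, lam=2.0, n_draws=1000, rng=rng)
    print("mean and variance of ||X||^2/q:", norms.mean(), norms.var())
    print("fraction of draws with X^T X~ = 0:", np.mean(inner == 0))
```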

3 Empirical Distributions 

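From $P$ to $\hat P$. If the distributions $P = P^{(q)}$ satisfy conditions (A1-2), then the empirical distributions $\hat P = \hat P^{(q,n)}$ satisfy these conditions with high probability as $\min(q, n) \to \infty$. Precisely, writing $X_1, \ldots, X_n$ for the underlying observations, one can easily deduce from condition (A2) that

$$\frac{1}{n} \sum_{i=1}^n \delta_{\|X_i\|^2/q} \to_{w,p} R$$

and

$$\frac{1}{n^2} \sum_{i,j=1}^n \min\bigl\{ |X_i^\top X_j/q|, 1 \bigr\} \to_p 0$$

as $\min(q, n) \to \infty$. Thus Theorem 2.1 implies that

$$\Gamma^\top \hat P \to_{w,p} Q$$

as both $q$ and $n$ tend to infinity, where the random projector $\Gamma$ and the empirical distribution $\hat P$ are assumed to be stochastically independent.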

Comparing $\hat P$ and $P$, part 1. In some sense Theorem 2.1 is a negative, though mathematically elegant result. It warns us against hasty conclusions about high-dimensional data sets after examining a couple of low-dimensional projections. In particular, one should not believe in multivariate normality only because several projections of the data "look normal". On the other hand, even small differences between different low-dimensional projections of $\hat P$ may be intriguing. Therefore we study the relationship between projections of the empirical distribution $\hat P$ and corresponding projections of $P$ in more detail.

In particular, we are interested in the halfspace norm

$$\|\Gamma^\top \hat P - \Gamma^\top P\|_{\rm KS} := \sup_{\text{closed halfspaces } H \subset \mathbb{R}^d} \bigl| \Gamma^\top \hat P(H) - \Gamma^\top P(H) \bigr|$$

of $\Gamma^\top \hat P - \Gamma^\top P$. In case of $d = 1$ this is the usual Kolmogorov-Smirnov norm of $\Gamma^\top \hat P - \Gamma^\top P$. In what follows we use several well-known results from empirical process theory. Instead of citing original papers in various places we simply refer to the excellent monographs of Pollard (1984) and van der Vaart and Wellner (1996). It is known that

$$(1) \qquad \mathbb{E} \sup_{\gamma \in \mathbb{R}^{q \times d}} \|\gamma^\top \hat P - \gamma^\top P\|_{\rm KS} \le C \sqrt{q/n}$$

for some universal constant $C$. For the latter supremum is just the halfspace norm of $\hat P - P$, and generally the set of closed halfspaces in $\mathbb{R}^k$ is a Vapnik-Cervonenkis class with Vapnik-Cervonenkis index $k + 1$. Inequality (1) does not capture the typical deviation between $d$-dimensional projections of $\hat P$ and $P$. In fact,

$$\sup_{\gamma \in \mathbb{R}^{q \times d}} \mathbb{E} \|\gamma^\top \hat P - \gamma^\top P\|_{\rm KS} \le C \sqrt{d/n},$$

which implies that

$$(2) \qquad \mathbb{E} \|\Gamma^\top \hat P - \Gamma^\top P\|_{\rm KS} \le C \sqrt{d/n}.$$
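The dimension-free scaling in (2) can be illustrated for $d = 1$. The sketch below assumes, purely for illustration, that $P = \mathcal{N}(0, I_q)$, so that $\gamma^\top P = \mathcal{N}(0,1)$ for every unit vector $\gamma$ and the Kolmogorov-Smirnov norm can be computed exactly; the function names are hypothetical.

```python
import numpy as np
from math import erf, sqrt


def ks_to_std_normal(sample):
    """Kolmogorov-Smirnov norm between the empirical cdf of a 1-d sample and N(0, 1)."""
    x = np.sort(sample)
    n = x.size
    cdf = np.array([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in x])
    return max(np.max(np.arange(1, n + 1) / n - cdf),
               np.max(cdf - np.arange(0, n) / n))


def scaled_ks_over_directions(q, n, n_directions, rng):
    """Return sqrt(n) * KS norm for several random 1-d projections of one data set."""
    data = rng.standard_normal((n, q))  # n observations from P = N(0, I_q)
    out = np.empty(n_directions)
    for idx in range(n_directions):
        gamma = rng.standard_normal(q)
        gamma /= np.linalg.norm(gamma)  # uniformly distributed unit vector
        out[idx] = np.sqrt(n) * ks_to_std_normal(data @ gamma)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    for q in (5, 50, 200):
        print(q, np.round(scaled_ks_over_directions(q, 400, 5, rng), 2))
```

The printed values of $\sqrt{n}\,\|\gamma^\top \hat P - \gamma^\top P\|_{\rm KS}$ should be of the same order for all $q$, in line with the bound $C\sqrt{d/n}$, which does not involve $q$.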

Our next result implies the limiting distribution of $\sqrt{n} \|\Gamma^\top \hat P - \Gamma^\top P\|_{\rm KS}$ under conditions (A1-2). More generally, let $\mathcal{H}$ be a class of measurable functions from $\mathbb{R}^d$ into $[-1, 1]$. Any finite signed measure $M$ on $\mathbb{R}^d$ defines an element $h \mapsto M(h) := \int h \, dM$ of the space $\ell_\infty(\mathcal{H})$ of all bounded functions on $\mathcal{H}$ equipped with supremum norm $\|z\|_{\mathcal{H}} := \sup_{h \in \mathcal{H}} |z(h)|$. We shall impose the following three conditions on the class $\mathcal{H}$ and the distribution $Q = \int \mathcal{N}_{d,v} \, R(dv)$:

(C1) There exists a countable subset $\mathcal{H}_o$ of $\mathcal{H}$ such that each $h \in \mathcal{H}$ can be represented as pointwise limit of some sequence in $\mathcal{H}_o$.

(C2) The set $\mathcal{H}$ satisfies the uniform entropy condition

$$\int_0^1 \sqrt{\log N(u, \mathcal{H})} \, du < \infty.$$

Here $N(u, \mathcal{H})$ is the supremum of $N(u, \mathcal{H}, \tilde Q)$ over all probability measures $\tilde Q$ on $\mathbb{R}^d$, and $N(u, \mathcal{H}, \tilde Q)$ is the smallest number $m$ such that $\mathcal{H}$ can be covered with $m$ balls having radius $u$ with respect to the pseudodistance

$$\rho_{\tilde Q}(g, h) := \sqrt{\tilde Q\bigl((g - h)^2\bigr)}.$$

(C3) For any sequence $(Q_k)_k$ of probability measures converging weakly to $Q$,

$$\|Q_k - Q\|_{\mathcal{H}} \to 0 \quad \text{as } k \to \infty.$$

Condition (C1) ensures that random elements such as $\|\Gamma^\top \hat P - \Gamma^\top P\|_{\mathcal{H}}$ are measurable.



An example for conditions (C1-2) is the set $\mathcal{H}$ of (indicators of) closed halfspaces in $\mathbb{R}^d$. Then condition (C3) is a consequence of general results by Billingsley and Topsøe (1967), provided that $Q(\{0\}) = 0$, i.e. $R(\{0\}) = 0$.



A particular consequence of (C2) is existence of a centered Gaussian process $B_Q$, a so-called $Q$-bridge, having uniformly continuous sample paths with respect to $\rho_Q$ and covariances

$$\mathbb{E}\bigl(B_Q(g) B_Q(h)\bigr) = Q(gh) - Q(g) Q(h),$$

which can be proved via a chaining argument.

Theorem 3.1 Suppose that the sequence $(P^{(q)})_{q \ge d}$ satisfies conditions (A1-2) of Theorem 2.1, and suppose that $\mathcal{H}$ fulfills conditions (C1-3). Then

$$B^{(q,n)} := \bigl( n^{1/2} (\Gamma^\top \hat P - \Gamma^\top P)(h) \bigr)_{h \in \mathcal{H}}$$

converges in distribution in $\ell_\infty(\mathcal{H})$ to $B_Q$ as $\min(q, n) \to \infty$.

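Comparing $\hat P$ and $P$, part 2. Theorem 3.1 takes into account the randomness in both the data (i.e. $\hat P$) and the projection matrix $\Gamma$. However, exploratory projection pursuit means considering several projections of one data set. Thus we consider independent copies $\Gamma_\ell = \Gamma_\ell^{(q)}$, $\ell \ge 1$, of $\Gamma$ which are also independent from $\hat P$. With these projection matrices we define

$$B_\ell^{(q,n)} := \bigl( n^{1/2} (\Gamma_\ell^\top \hat P - \Gamma_\ell^\top P)(h) \bigr)_{h \in \mathcal{H}}$$

and study the distribution of

$$\boldsymbol{B}^{(q,n)} := \bigl( B_\ell^{(q,n)}(h) \bigr)_{(\ell, h) \in \Lambda \times \mathcal{H}}$$

for $\Lambda := \{1, \ldots, L\}$ with an arbitrary fixed integer $L \ge 1$.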

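Subsequently a particular decomposition of the $Q$-bridge $B_Q$ will be used:

$$B_Q = B'_Q + B''_Q$$

with stochastically independent and centered Gaussian processes $B'_Q$, $B''_Q$ on $\mathcal{H}$, where

$$\mathbb{E}\bigl(B'_Q(g) B'_Q(h)\bigr) = Q(gh) - \int \mathcal{N}_{d,v}(g) \mathcal{N}_{d,v}(h) \, R(dv) = \int \bigl( \mathcal{N}_{d,v}(gh) - \mathcal{N}_{d,v}(g) \mathcal{N}_{d,v}(h) \bigr) \, R(dv),$$

$$\mathbb{E}\bigl(B''_Q(g) B''_Q(h)\bigr) = \int \mathcal{N}_{d,v}(g) \mathcal{N}_{d,v}(h) \, R(dv) - Q(g) Q(h).$$

By means of Anderson's (1955) Lemma or a further application of chaining one can show that both $B'_Q$ and $B''_Q$ admit versions with uniformly continuous sample paths.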



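Theorem 3.2 Suppose that the conditions of Theorem 3.1 are satisfied. Further, let $B'_{Q,1}, B'_{Q,2}, B'_{Q,3}, \ldots$ be independent copies of $B'_Q$ and independent from $B''_Q$. Then for any fixed integer $L \ge 1$, the process $\boldsymbol{B}^{(q,n)} = \bigl( B_\ell^{(q,n)}(h) \bigr)_{(\ell,h) \in \Lambda \times \mathcal{H}}$ converges in distribution in $\ell_\infty(\Lambda \times \mathcal{H})$ to

$$\boldsymbol{B} := \bigl( B'_{Q,\ell}(h) + B''_Q(h) \bigr)_{(\ell,h) \in \Lambda \times \mathcal{H}}$$

as $\min(q, n) \to \infty$.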

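Remark 3.3 (Understanding the decomposition $B_Q = B'_Q + B''_Q$ heuristically) Note that $B^{(q,n)}(h) = \sqrt{n} \int h(\Gamma^\top x) \, (\hat P - P)(dx)$. Thus

$$\mathbb{E}\bigl(B^{(q,n)}(h) \,\big|\, \hat P\bigr) = \sqrt{n} \int \mathcal{N}_{d,q,\|x\|}(h) \, (\hat P - P)(dx)$$

with $\mathcal{N}_{d,q,\|x\|} := \mathcal{L}(\Gamma^\top x)$. Here we utilize orthogonal invariance of $\mathcal{L}(\Gamma)$. Consequently, $\mathbb{E}(B^{(q,n)} \,|\, \hat P)$ is a standardized empirical process indexed by the special functions $x \mapsto \mathcal{N}_{d,q,\|x\|}(h)$, $h \in \mathcal{H}$, and

$$\mathbb{E}\Bigl( \mathbb{E}\bigl(B^{(q,n)}(g) \,\big|\, \hat P\bigr) \, \mathbb{E}\bigl(B^{(q,n)}(h) \,\big|\, \hat P\bigr) \Bigr) = \int \mathcal{N}_{d,q,\|x\|}(g) \mathcal{N}_{d,q,\|x\|}(h) \, P(dx) - \int \mathcal{N}_{d,q,\|x\|}(g) \, P(dx) \int \mathcal{N}_{d,q,\|x\|}(h) \, P(dx).$$

Since $\mathcal{N}_{d,q,\|x\|}$ is close to $\mathcal{N}_{d,\|x\|^2/q}$ and $\mathcal{L}(\|X\|^2/q)$ is close to $R$ for large $q$, the latter covariance is close to

$$\int \mathcal{N}_{d,v}(g) \mathcal{N}_{d,v}(h) \, R(dv) - \int \mathcal{N}_{d,v}(g) \, R(dv) \int \mathcal{N}_{d,v}(h) \, R(dv) = \mathbb{E}\bigl(B''_Q(g) B''_Q(h)\bigr).$$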

Example 3.4 Suppose that $d = 1$, and let $\mathcal{H}$ consist of all indicator functions $1_{(-\infty, t]}$, $t \in \mathbb{R}$. Then Theorems 3.1 and 3.2 are applicable whenever $R(\{0\}) = 0$. Writing $M(t)$ instead of $M(1_{(-\infty,t]})$, the covariance functions of $B_Q$, $B'_Q$ and $B''_Q$ are given by

$$\mathbb{E}\bigl(B_Q(s) B_Q(t)\bigr) = Q(\min\{s,t\}) - Q(s) Q(t),$$

$$\mathbb{E}\bigl(B'_Q(s) B'_Q(t)\bigr) = Q(\min\{s,t\}) - \int \Phi(v^{-1/2} s) \Phi(v^{-1/2} t) \, R(dv),$$

$$\mathbb{E}\bigl(B''_Q(s) B''_Q(t)\bigr) = \int \Phi(v^{-1/2} s) \Phi(v^{-1/2} t) \, R(dv) - Q(s) Q(t)$$

for $s, t \in \mathbb{R}$, where $Q(u) = \int \Phi(v^{-1/2} u) \, R(dv)$, and $\Phi$ denotes the standard Gaussian distribution function.
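The three covariance functions of Example 3.4 can be evaluated directly when $R$ has finite support. The sketch below assumes a discrete mixing distribution $R = \sum_j w_j \delta_{v_j}$ with $v_j > 0$; the function name and the chosen numbers are illustrative only. It confirms numerically that the covariances of $B'_Q$ and $B''_Q$ add up to that of $B_Q$, and that the $B''_Q$ part vanishes for degenerate $R$.

```python
from math import erf, sqrt


def phi(x):
    """Standard Gaussian distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def covariances(s, t, variances, weights):
    """Covariances of B_Q, B'_Q and B''_Q at (s, t) for R = sum_j weights[j] * delta_{variances[j]}."""
    def q_cdf(u):
        # Q(u) = int Phi(v^{-1/2} u) R(dv)
        return sum(w * phi(u / sqrt(v)) for v, w in zip(variances, weights))

    # int Phi(v^{-1/2} s) Phi(v^{-1/2} t) R(dv)
    mixed = sum(w * phi(s / sqrt(v)) * phi(t / sqrt(v))
                for v, w in zip(variances, weights))
    k_total = q_cdf(min(s, t)) - q_cdf(s) * q_cdf(t)
    k_within = q_cdf(min(s, t)) - mixed
    k_between = mixed - q_cdf(s) * q_cdf(t)
    return k_total, k_within, k_between


if __name__ == "__main__":
    print(covariances(0.3, 1.0, variances=[0.5, 2.0], weights=[0.5, 0.5]))
    print(covariances(0.3, 1.0, variances=[1.0], weights=[1.0]))
```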
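Remark 3.5 (Conservative inference) Under conditions (A1-2) and (C1-3), pretending the empirical processes $B_\ell^{(q,n)}$, $1 \le \ell \le L$, to be independent and identically distributed leads typically to conservative procedures. Precisely, let $U$ be an open subset of $\ell_\infty(\mathcal{H})$. For instance let $U = \{ b \in \ell_\infty(\mathcal{H}) : \|b\|_{\mathcal{H}} < \kappa \}$ for some constant $\kappa > 0$. Then it follows from Theorem 3.2 that

$$\liminf_{\min(q,n) \to \infty} \mathbb{P}\bigl( B_\ell^{(q,n)} \in U \text{ for } 1 \le \ell \le L \bigr) \ge \mathbb{P}(B_Q \in U)^L.$$

This may be verified as follows: By Theorem 3.2 and the Portmanteau Theorem, the limes inferior on the left hand side is not smaller than

$$\mathbb{P}\bigl( B'_{Q,\ell} + B''_Q \in U \text{ for } 1 \le \ell \le L \bigr) = \mathbb{E} \, \mathbb{P}\bigl( B'_{Q,\ell} + B''_Q \in U \text{ for } 1 \le \ell \le L \,\big|\, B''_Q \bigr) = \mathbb{E}\Bigl( \mathbb{P}\bigl( B'_Q + B''_Q \in U \,\big|\, B''_Q \bigr)^L \Bigr),$$

and by Jensen's inequality the latter expression is not smaller than

$$\Bigl( \mathbb{E} \, \mathbb{P}\bigl( B'_Q + B''_Q \in U \,\big|\, B''_Q \bigr) \Bigr)^L = \mathbb{P}\bigl( B'_Q + B''_Q \in U \bigr)^L = \mathbb{P}(B_Q \in U)^L.$$

If (A1-2) is strengthened to (B) and $\mathbb{P}(B_Q \in \partial U) = 0$, then the previous arguments lead to

$$\left. \begin{array}{l} \lim_{\min(q,n) \to \infty} \mathbb{P}\bigl( B_\ell^{(q,n)} \in U \text{ for } 1 \le \ell \le L \bigr) \\ \lim_{\min(q,n) \to \infty} \mathbb{P}\bigl( B_\ell^{(q,n)} \in \overline{U} \text{ for } 1 \le \ell \le L \bigr) \end{array} \right\} = \mathbb{P}(B_Q \in U)^L,$$

because $B''_Q = 0$ almost surely.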

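Remark 3.6 (The conditional point of view) Considering several projections of one data set means that we are interested in the conditional distribution of $n^{1/2}(\Gamma^\top \hat P - \Gamma^\top P)$, given $\hat P$. Indeed one may interpret Theorem 3.2 in the sense that for large $q$ and $n$,

$$\mathcal{L}\bigl(B^{(q,n)} \,\big|\, \hat P\bigr) \approx \mathcal{L}\bigl(B'_Q + B''_Q \,\big|\, B''_Q\bigr).$$

In case of the stronger condition (B) in Corollary 2.2, $B''_Q = 0$, and

$$\mathcal{L}\bigl(B^{(q,n)} \,\big|\, \hat P\bigr) \approx \mathcal{L}(B_Q).$$

Here are precise statements: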



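Corollary 3.7 Suppose that the conditions of Theorem 3.1 are satisfied. Let $F$ be any bounded and continuous functional on $\ell_\infty(\mathcal{H})$ such that $F(B^{(q,n)})$ is measurable for all $q \ge d$ and $n \ge 1$. Then

$$\mathbb{E}\bigl(F(B^{(q,n)}) \,\big|\, \hat P\bigr) \to_{\mathcal{L}} \mathbb{E}\bigl(F(B'_Q + B''_Q) \,\big|\, B''_Q\bigr)$$

as $\min(q, n) \to \infty$. In case of a degenerate distribution $R$,

$$\mathbb{E}\bigl(F(B^{(q,n)}) \,\big|\, \hat P\bigr) \to_p \mathbb{E} F(B_Q)$$

as $\min(q, n) \to \infty$.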

4 Proofs 

4.1 Hoeffding's (1952) trick 



In connection with randomization tests, Hoeffding (1952) observed that weak convergence of conditional distributions of test statistics is equivalent to the weak convergence of the unconditional distribution of suitable statistics in $\mathbb{R}^2$. His result can be extended straightforwardly as follows.

Lemma 4.1 (Hoeffding) For $k \ge 1$ let $X_k, \tilde X_k \in \mathcal{X}_k$ and $G_k \in \mathcal{G}_k$ be independent random variables, where $X_k$, $\tilde X_k$ are identically distributed. Further let $m_k$ be some measurable mapping from $\mathcal{X}_k \times \mathcal{G}_k$ into the separable metric space $(M, \rho)$, and let $Q$ be a fixed Borel probability measure on $M$. Then, as $k \to \infty$, the following two assertions are equivalent:

(D1) $\mathcal{L}\bigl(m_k(X_k, G_k) \,\big|\, G_k\bigr) \to_{w,p} Q$.

(D2) $\mathcal{L}\bigl(m_k(X_k, G_k), m_k(\tilde X_k, G_k)\bigr) \to_w Q \otimes Q$.

Applications of this equivalence with non-Euclidean spaces $M$ are presented by Romano (1989). We shall utilize Lemma 4.1 in order to prove Theorem 2.1.



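Proof of Lemma 4.1 Define $Y_k := m_k(X_k, G_k)$ and $\tilde Y_k := m_k(\tilde X_k, G_k)$. Suppose first that (D2) is true, i.e. $\mathcal{L}(Y_k, \tilde Y_k) \to_w Q \otimes Q$. Then for any $f \in C_b(M)$,

$$\begin{aligned}
\mathbb{E}\Bigl( \bigl( \mathbb{E}(f(Y_k) \,|\, G_k) - Q(f) \bigr)^2 \Bigr)
&= \mathbb{E}\bigl( \mathbb{E}(f(Y_k) \,|\, G_k)^2 \bigr) - 2 Q(f) \, \mathbb{E} \, \mathbb{E}(f(Y_k) \,|\, G_k) + Q(f)^2 \\
&= \mathbb{E} \, \mathbb{E}\bigl( f(Y_k) f(\tilde Y_k) \,\big|\, G_k \bigr) - 2 Q(f) \, \mathbb{E} \, \mathbb{E}(f(Y_k) \,|\, G_k) + Q(f)^2 \\
&= \mathbb{E}\bigl( f(Y_k) f(\tilde Y_k) \bigr) - 2 Q(f) \, \mathbb{E} f(Y_k) + Q(f)^2 \\
&\to \int\!\!\int f(y) f(\tilde y) \, Q(dy) \, Q(d\tilde y) - Q(f)^2 \\
&= 0.
\end{aligned}$$

Thus $\mathcal{L}(Y_k \,|\, G_k) \to_{w,p} Q$.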

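On the other hand, suppose that (D1) is satisfied, i.e. $\mathcal{L}(Y_k \,|\, G_k) \to_{w,p} Q$. Then for arbitrary $f, g \in C_b(M)$,

$$\begin{aligned}
\mathbb{E}\bigl( f(Y_k) g(\tilde Y_k) \bigr)
&= \mathbb{E} \, \mathbb{E}\bigl( f(Y_k) g(\tilde Y_k) \,\big|\, G_k \bigr) \\
&= \mathbb{E}\bigl( \mathbb{E}(f(Y_k) \,|\, G_k) \, \mathbb{E}(g(\tilde Y_k) \,|\, G_k) \bigr) \\
&\to Q(f) Q(g),
\end{aligned}$$

because $\mathbb{E}(h(Y_k) \,|\, G_k) \to_p \int h \, dQ$ and $|\mathbb{E}(h(Y_k) \,|\, G_k)| \le \|h\|_\infty < \infty$ for each $h \in C_b(M)$. Thus we know that $\mathbb{E} F(Y_k, \tilde Y_k) \to \int F \, dQ \otimes Q$ for arbitrary functions $F(y, \tilde y) = f(y) g(\tilde y)$ with $f, g \in C_b(M)$. But this is known to be equivalent to weak convergence of $\mathcal{L}(Y_k, \tilde Y_k)$ to $Q \otimes Q$; see van der Vaart and Wellner (1996, Chapter 1.4).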

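Here is an alternative argument: With $\hat Q_k := \mathcal{L}(Y_k \,|\, G_k)$, Assumption (D1) is equivalent to $D_{\rm BL}(\hat Q_k, Q) \to_p 0$. To prove that $\mathcal{L}(Y_k, \tilde Y_k) \to_w Q \otimes Q$, it suffices to show that $\mathbb{E}(F(Y_k, \tilde Y_k) \,|\, G_k) \to_p \int F \, dQ \otimes Q$ for any function $F : M \times M \to [-1, 1]$ such that $|F(y, \tilde y) - F(z, \tilde z)| \le \rho(y, z) + \rho(\tilde y, \tilde z)$ for arbitrary $y, \tilde y, z, \tilde z \in M$. But this entails that $F(y, \cdot), F(\cdot, \tilde y) \in \mathcal{F}_{\rm BL}$ for arbitrary $y, \tilde y \in M$. Consequently,

$$\begin{aligned}
\Bigl| \mathbb{E}\bigl( F(Y_k, \tilde Y_k) \,\big|\, G_k \bigr) - \int F \, dQ \otimes Q \Bigr|
&= \Bigl| \int F \, d(\hat Q_k \otimes \hat Q_k - Q \otimes Q) \Bigr| \\
&\le \int \Bigl| \int F(\cdot, \tilde y) \, d(\hat Q_k - Q) \Bigr| \, \hat Q_k(d\tilde y) + \int \Bigl| \int F(y, \cdot) \, d(\hat Q_k - Q) \Bigr| \, Q(dy) \\
&\le 2 D_{\rm BL}(\hat Q_k, Q). \qquad \Box
\end{aligned}$$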
4.2 Proofs for Section 2

That $\Gamma = \Gamma^{(q)}$ is "uniformly" distributed on the set of column-wise orthonormal matrices in $\mathbb{R}^{q \times d}$ means that $\mathcal{L}(U\Gamma) = \mathcal{L}(\Gamma)$ for any fixed orthonormal matrix $U \in \mathbb{R}^{q \times q}$. For existence and uniqueness of the latter distribution we refer to Eaton (1989, Chapters 1-2). For the present purposes the following explicit construction of $\Gamma$ described in Eaton (1989, Chapter 7) is sufficient. Let $Z = Z^{(q)} := (Z_1, Z_2, \ldots, Z_d)$ be a random matrix in $\mathbb{R}^{q \times d}$ with independent, standard Gaussian column vectors $Z_j \in \mathbb{R}^q$. Then

$$\Gamma := Z (Z^\top Z)^{-1/2}$$

has the desired distribution, and

$$(3) \qquad \Gamma = q^{-1/2} Z \bigl( I + O_p(q^{-1/2}) \bigr) \quad \text{as } q \to \infty.$$

This equality can be viewed as an extension of Poincaré's (1912) Lemma.
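The construction $\Gamma = Z(Z^\top Z)^{-1/2}$ translates directly into code. The following is a minimal sketch with a hypothetical function name; it computes the symmetric inverse square root through an eigendecomposition of the $d \times d$ Gram matrix, which is one of several possible choices.

```python
import numpy as np


def sample_gamma(q, d, rng):
    """Return (Gamma, Z) with Gamma = Z (Z^T Z)^{-1/2} for a q x d standard Gaussian matrix Z."""
    z = rng.standard_normal((q, d))
    # Symmetric inverse square root of the Gram matrix Z^T Z via its eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(z.T @ z)
    inv_sqrt = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return z @ inv_sqrt, z


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    for q in (10, 1000, 10000):
        gamma, z = sample_gamma(q, 2, rng)
        ortho_err = np.max(np.abs(gamma.T @ gamma - np.eye(2)))
        rel_err = np.linalg.norm(np.sqrt(q) * gamma - z) / np.linalg.norm(z)
        print(f"q = {q:5d}: orthonormality error = {ort_err:.1e}" if False else
              f"q = {q:5d}: orthonormality error = {ortho_err:.1e}, "
              f"relative error of q^(1/2) * Gamma vs Z = {rel_err:.4f}")
```

The columns of the returned matrix are orthonormal up to rounding, and the relative difference between $q^{1/2}\Gamma$ and $Z$ should decrease roughly like $q^{-1/2}$, in agreement with (3).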

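Proof of Theorem 2.1 Let $\Gamma = \Gamma(Z)$ as above. Suppose that $Z = Z^{(q)}$, $X = X^{(q)}$ and $\tilde X = \tilde X^{(q)}$ are independent with $\mathcal{L}(X) = \mathcal{L}(\tilde X) = P$, and let $Y$, $\tilde Y$ be two independent random vectors in $\mathbb{R}^d$ with distribution $Q$. According to Lemma 4.1, condition (A1) is equivalent to

(A1') $\quad (\Gamma^\top X, \Gamma^\top \tilde X) \to_{\mathcal{L}} (Y, \tilde Y)$ as $q \to \infty$.

Because of equation (3) this can be rephrased as

(A1'') $\quad (Y^{(q)}, \tilde Y^{(q)}) := (q^{-1/2} Z^\top X, q^{-1/2} Z^\top \tilde X) \to_{\mathcal{L}} (Y, \tilde Y)$ as $q \to \infty$.

Now we prove equivalence of (A1'') and (A2) starting from the observation that

$$\mathcal{L}\bigl( (Y^{(q)\top}, \tilde Y^{(q)\top})^\top \,\big|\, X, \tilde X \bigr) = \mathcal{N}_{2d}(0, \Sigma^{(q)}),$$

where

$$\Sigma^{(q)} := \begin{pmatrix} q^{-1} \|X\|^2 I_d & q^{-1} X^\top \tilde X \, I_d \\ q^{-1} X^\top \tilde X \, I_d & q^{-1} \|\tilde X\|^2 I_d \end{pmatrix}.$$

Suppose that condition (A2) holds. Then $\Sigma^{(q)}$ converges in distribution to a random diagonal matrix

$$\Sigma := \begin{pmatrix} S^2 I_d & 0 \\ 0 & \tilde S^2 I_d \end{pmatrix}$$

with independent random variables $S^2$, $\tilde S^2$ having distribution $R$. Clearly this implies that

$$(Y^{(q)}, \tilde Y^{(q)}) \to_{\mathcal{L}} (Y, \tilde Y)$$

with $Q = \mathbb{E} \, \mathcal{N}_d(0, S^2 I_d)$. Hence (A1'') holds.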

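On the other hand, suppose that (A1'') holds. For any $t = (t_1^\top, t_2^\top)^\top \in \mathbb{R}^{2d}$, the Fourier transform of $\mathcal{L}\bigl( (Y^{(q)\top}, \tilde Y^{(q)\top})^\top \bigr)$ at $t$ equals

$$\mathbb{E} \exp\bigl( i (t_1^\top Y^{(q)} + t_2^\top \tilde Y^{(q)}) \bigr) = \mathbb{E} \exp\bigl( - t^\top \Sigma^{(q)} t / 2 \bigr) = H^{(q)}(a(t)),$$

where $i$ stands for $\sqrt{-1}$, $a(t) := (\|t_1\|^2/2, \|t_2\|^2/2, t_1^\top t_2)^\top \in \mathbb{R}^3$ and

$$H^{(q)}(a) := \mathbb{E} \exp\bigl( - a_1 \|X\|^2/q - a_2 \|\tilde X\|^2/q - a_3 X^\top \tilde X/q \bigr)$$

denotes the Laplace transform of $\mathcal{L}\bigl( (\|X\|^2/q, \|\tilde X\|^2/q, X^\top \tilde X/q)^\top \bigr)$ at $a \in \mathbb{R}^3$. By assumption, the Fourier transform at $t$ converges to

$$\mathbb{E} \exp(i t_1^\top Y) \, \mathbb{E} \exp(i t_2^\top \tilde Y).$$

Setting $t_2 = 0$ and varying $t_1$ shows that the Laplace transform of $\mathcal{L}(\|X\|^2/q)$ converges pointwise on $[0, \infty)$ to a continuous function. Hence $\|X\|^2/q$ converges in distribution to some random variable $S^2 \ge 0$, and $Q = \mathbb{E} \, \mathcal{N}_{d,S^2}$. Therefore, if $\tilde S^2$ denotes an independent copy of $S^2$, we know that $H^{(q)}(a(t))$ converges to

$$\mathbb{E} \exp(-a_1(t) S^2) \, \mathbb{E} \exp(-a_2(t) \tilde S^2) = \mathbb{E} \exp\bigl( - a_1(t) S^2 - a_2(t) \tilde S^2 - a_3(t) \cdot 0 \bigr).$$

A problem at this point is that for dimension $d = 1$ the set $\{a(t) : t \in \mathbb{R}^{2d}\} \subset \mathbb{R}^3$ has empty interior. Thus we cannot apply the standard argument about weak convergence and convergence of Laplace transforms. However, letting $t_2 = \pm t_1$ with $\|t_1\|^2/2 = 1$, one may conclude that

$$\begin{aligned}
0 &= \lim_{q \to \infty} \bigl( H^{(q)}(1,1,2) + H^{(q)}(1,1,-2) - 2 H^{(q)}(1,0,0)^2 \bigr) \\
&= \lim_{q \to \infty} \bigl( H^{(q)}(1,1,2) + H^{(q)}(1,1,-2) - 2 \, \mathbb{E} \exp(-\|X\|^2/q - \|\tilde X\|^2/q) \bigr) \\
&= 2 \lim_{q \to \infty} \mathbb{E}\Bigl( \exp(-\|X\|^2/q - \|\tilde X\|^2/q) \bigl( \cosh(2 X^\top \tilde X/q) - 1 \bigr) \Bigr).
\end{aligned}$$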
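But for arbitrary small $\epsilon > 0$ and large $r > 0$,

$$\begin{aligned}
\mathbb{E}\Bigl( \exp(-\|X\|^2/q - \|\tilde X\|^2/q) & \bigl( \cosh(2 X^\top \tilde X/q) - 1 \bigr) \Bigr) \\
&\ge \exp(-2r) (\cosh(2\epsilon) - 1) \, \mathbb{P}\bigl( \|X\|^2/q < r, \|\tilde X\|^2/q < r, |X^\top \tilde X/q| \ge \epsilon \bigr) \\
&\ge \exp(-2r) (\cosh(2\epsilon) - 1) \bigl( \mathbb{P}(|X^\top \tilde X/q| \ge \epsilon) - 2 \, \mathbb{P}(\|X\|^2/q \ge r) \bigr) \\
&\ge \exp(-2r) (\cosh(2\epsilon) - 1) \bigl( \mathbb{P}(|X^\top \tilde X/q| \ge \epsilon) - 2 \, \mathbb{P}(S^2 \ge r) + o(1) \bigr).
\end{aligned}$$

Hence

$$\limsup_{q \to \infty} \mathbb{P}(|X^\top \tilde X/q| \ge \epsilon) \le 2 \, \mathbb{P}(S^2 \ge r).$$

Letting $r \to \infty$ shows that $X^\top \tilde X/q \to_p 0$. $\Box$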

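Proof of equivalence of (A2) and (A3). Proving that (A3) implies (A2) is elementary. In order to show that (A2) implies (A3) note first that conditions (A2) for the distributions $P^{(q)}$ imply the same conditions for the symmetrized distributions

$$P_o = P_o^{(q)} := \mathcal{L}(X - \tilde X) = \mathcal{L}\Bigl( U \bigl( \sigma_k (Z_k - Z_{q+k}) \bigr)_{1 \le k \le q} \Bigr).$$

Condition (A2) for these distributions reads as follows.

$$(4) \qquad \mathcal{L}\Bigl( \sum_{k=1}^q (Z_k - Z_{q+k})^2 \sigma_k^2 / q \Bigr) \to_w R_o = R \star R \quad \text{and}$$

$$(5) \qquad \sum_{k=1}^q (Z_k - Z_{q+k})(Z_{2q+k} - Z_{3q+k}) \sigma_k^2 / q \to_p 0.$$

The factors $(Z_k - Z_{q+k})(Z_{2q+k} - Z_{3q+k})$, $1 \le k \le q$, in (5) are independent, identically and symmetrically distributed. By conditioning on any one of these factors one can deduce from (5) that $\max_{1 \le k \le q} \sigma_k^2/q \to 0$. But then

$$\sum_{k=1}^q \sigma_k^2 (Z_k - Z_{q+k})^2 / q = 2 \|\sigma\|^2/q + o_p\bigl( 1 + \|\sigma\|^2/q \bigr),$$

and one can deduce from (4) that $\|\sigma\|^2/q$ converges to some fixed number $v$; in particular, $R = \delta_v$. Now we return to the original distributions $P$. Here the second half of (A2) means that

$$\sum_{k=1}^q (\mu_k + \sigma_k Z_k)(\mu_k + \sigma_k Z_{q+k}) / q = \|\mu\|^2/q + \sum_{k=1}^q \mu_k \sigma_k (Z_k + Z_{q+k}) / q + \sum_{k=1}^q \sigma_k^2 Z_k Z_{q+k} / q = o_p(1).$$

Since

$$\mathbb{E}\Bigl( \Bigl( \sum_{k=1}^q \mu_k \sigma_k (Z_k + Z_{q+k}) / q \Bigr)^2 \Bigr) = 2 \sum_{k=1}^q \mu_k^2 \sigma_k^2 / q^2 \le 2 \max_{1 \le k \le q} \frac{\sigma_k^2}{q} \cdot \frac{\|\mu\|^2}{q} = o\bigl( \|\mu\|^2/q \bigr)$$

and

$$\mathbb{E}\Bigl( \Bigl( \sum_{k=1}^q \sigma_k^2 Z_k Z_{q+k} / q \Bigr)^2 \Bigr) = \sum_{k=1}^q \sigma_k^4 / q^2 \le \max_{1 \le k \le q} \frac{\sigma_k^2}{q} \cdot \frac{\|\sigma\|^2}{q} \to 0,$$

it follows that $\|\mu\|^2/q \to 0$. $\Box$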

4.3 Proofs for Section 3

Since Theorem 3.1 is just Theorem 3.2 with $L = 1$, it suffices to verify the latter.

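Proof of Theorem 3.2 It suffices to verify the following two claims:

(F1) As $q \to \infty$ and $n \to \infty$, the finite-dimensional marginal distributions of the process $\boldsymbol{B}^{(q,n)} = \bigl( B_\ell^{(q,n)}(h) \bigr)_{(\ell,h) \in \Lambda \times \mathcal{H}}$ converge to the corresponding finite-dimensional distributions of the limit process $\boldsymbol{B} = \bigl( B'_{Q,\ell}(h) + B''_Q(h) \bigr)_{(\ell,h) \in \Lambda \times \mathcal{H}}$.

(F2) As $q \to \infty$, $n \to \infty$ and $\delta \downarrow 0$,

$$\max_{\ell \in \Lambda} \sup_{g, h \in \mathcal{H} : \rho_Q(g,h) < \delta} \bigl| B_\ell^{(q,n)}(g) - B_\ell^{(q,n)}(h) \bigr| \to_p 0.$$

The second condition, (F2), means that the processes $\boldsymbol{B}^{(q,n)}$ are asymptotically equicontinuous with respect to the pseudodistance

$$\rho_Q\bigl( (\ell, g), (m, h) \bigr) := 1\{\ell \ne m\} + \rho_Q(g, h)$$

on $\Lambda \times \mathcal{H}$.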

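In order to verify assertions (F1-2) we consider the conditional distribution of $\boldsymbol{B}^{(q,n)}$ given the random matrix

$$\boldsymbol{\Gamma} = \boldsymbol{\Gamma}^{(q)} := (\Gamma_1, \Gamma_2, \ldots, \Gamma_L) \in \mathbb{R}^{q \times Ld}.$$

In fact, if we define

$$f_{\ell,h}(v) := h(v_\ell) \quad \text{for } v = (v_1^\top, \ldots, v_L^\top)^\top \in \mathbb{R}^{Ld},$$

then

$$B_\ell^{(q,n)}(h) = n^{1/2} (\boldsymbol{\Gamma}^\top \hat P - \boldsymbol{\Gamma}^\top P)(f_{\ell,h}).$$

Thus $\mathcal{L}(\boldsymbol{B}^{(q,n)} \,|\, \boldsymbol{\Gamma})$ is essentially the distribution of an empirical process based on $n$ independent random vectors with distribution $\boldsymbol{\Gamma}^\top P$ on $\mathbb{R}^{Ld}$ and indexed by the family $\tilde{\mathcal{H}} := \{ f_{\ell,h} : \ell \in \Lambda, h \in \mathcal{H} \}$.

The multivariate version of Lindeberg's Central Limit Theorem entails that for large $q$ and $n$, the finite-dimensional marginal distributions of $\boldsymbol{B}^{(q,n)}$, conditional on $\boldsymbol{\Gamma}$, can be approximated by the corresponding finite-dimensional distributions of a centered Gaussian process on $\Lambda \times \mathcal{H}$ with the same covariance function, namely,

$$\Sigma^{(q)}\bigl( (\ell, g), (m, h) \bigr) := \mathrm{Cov}\bigl( B_\ell^{(q,n)}(g), B_m^{(q,n)}(h) \,\big|\, \boldsymbol{\Gamma} \bigr) = \boldsymbol{\Gamma}^\top P(f_{\ell,g} f_{m,h}) - \boldsymbol{\Gamma}^\top P(f_{\ell,g}) \, \boldsymbol{\Gamma}^\top P(f_{m,h}).$$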
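It follows from equality (3) and the proof of Theorem 2.1 that

$$\boldsymbol{\Gamma}^\top P \to_{w,p} \boldsymbol{Q} := \int \mathcal{N}_{Ld,v} \, R(dv) \quad \text{as } q \to \infty,$$

and this should imply convergence of $\Sigma^{(q)}$ to some limiting function as well. It was shown by Billingsley and Topsøe (1967) that condition (C3) is equivalent to

$$(6) \qquad \lim_{\delta \downarrow 0} \sup_{h \in \mathcal{H}} Q\Bigl\{ y \in \mathbb{R}^d : \sup_{z : \|z - y\| < \delta} |h(z) - h(y)| > \epsilon \Bigr\} = 0 \quad \text{for any } \epsilon > 0.$$

Note that the $d$-dimensional marginal distributions of $\boldsymbol{Q}$ are just $Q$. Therefore one can easily deduce from (6) that for any fixed $\epsilon > 0$,

$$\lim_{\delta \downarrow 0} \sup_{f', f'' \in \tilde{\mathcal{H}} \cup \{1\}} \boldsymbol{Q}\Bigl\{ v \in \mathbb{R}^{Ld} : \sup_{w : \|w - v\| < \delta} |f'f''(w) - f'f''(v)| > \epsilon \Bigr\} = 0.$$

Hence a second application of Billingsley and Topsøe (1967) shows that

$$(7) \qquad \sup_{f', f'' \in \tilde{\mathcal{H}} \cup \{1\}} \bigl| \boldsymbol{\Gamma}^\top P(f'f'') - \boldsymbol{Q}(f'f'') \bigr| \to_p 0 \quad \text{as } q \to \infty,$$

because $\boldsymbol{\Gamma}^\top P \to_{w,p} \boldsymbol{Q}$. In particular, as $q \to \infty$, the conditional covariance function $\Sigma^{(q)}$ converges uniformly in probability to the covariance function $\Sigma$, where

$$\begin{aligned}
\Sigma\bigl( (\ell, g), (m, h) \bigr) &:= \boldsymbol{Q}(f_{\ell,g} f_{m,h}) - \boldsymbol{Q}(f_{\ell,g}) \boldsymbol{Q}(f_{m,h}) \\
&= \int \mathcal{N}_{Ld,v}(f_{\ell,g} f_{m,h}) \, R(dv) - Q(g) Q(h) \\
&= \begin{cases} \int \mathcal{N}_{d,v}(gh) \, R(dv) - Q(g) Q(h) & \text{if } \ell = m, \\ \int \mathcal{N}_{d,v}(g) \mathcal{N}_{d,v}(h) \, R(dv) - Q(g) Q(h) & \text{if } \ell \ne m, \end{cases} \\
&= \mathrm{Cov}\bigl( B'_{Q,\ell}(g) + B''_Q(g), B'_{Q,m}(h) + B''_Q(h) \bigr).
\end{aligned}$$

This proves assertion (F1).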

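As for assertion (F2), it is well-known from empirical process theory that conditions (C1-2) imply that for arbitrary fixed $\epsilon > 0$,

$$(8) \qquad \max_{\ell \in \Lambda} \mathbb{P}\Bigl( \sup_{g, h \in \mathcal{H} : \rho_\ell^{(q)}(g,h) < \delta} \bigl| B_\ell^{(q,n)}(g) - B_\ell^{(q,n)}(h) \bigr| \ge \epsilon \,\Big|\, \boldsymbol{\Gamma} \Bigr) \to_p 0$$

as $\min(q, n) \to \infty$ and $\delta \downarrow 0$. Here

$$\rho_\ell^{(q)}(g, h) := \sqrt{\boldsymbol{\Gamma}^\top P\bigl( (f_{\ell,g} - f_{\ell,h})^2 \bigr)} = \sqrt{\Gamma_\ell^\top P\bigl( (g - h)^2 \bigr)}.$$

But it follows from (7) that

$$\max_{\ell \in \Lambda} \sup_{g, h \in \mathcal{H}} \bigl| \rho_\ell^{(q)}(g, h)^2 - \rho_Q(g, h)^2 \bigr| \to_p 0$$

as $q \to \infty$. Hence one may replace $\rho_\ell^{(q)}$ in (8) with $\rho_Q$ and obtain assertion (F2). $\Box$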

Proof of Corollary 3.7 The main trick is to replace conditional expectations with suitable sample means. Note that conditional on $\hat P$, the processes $B_1^{(q,n)}, B_2^{(q,n)}, B_3^{(q,n)}, \ldots$ are independent copies of $B^{(q,n)}$. Likewise, conditional on $B''_Q$, the processes $B'_{Q,1} + B''_Q, B'_{Q,2} + B''_Q, B'_{Q,3} + B''_Q, \ldots$ are independent copies of $B'_Q + B''_Q$. Hence

$$\left. \begin{array}{l} \mathbb{E} \Bigl| \mathbb{E}\bigl( F(B^{(q,n)}) \,\big|\, \hat P \bigr) - L^{-1} \sum_{\ell=1}^L F(B_\ell^{(q,n)}) \Bigr| \\ \mathbb{E} \Bigl| \mathbb{E}\bigl( F(B'_Q + B''_Q) \,\big|\, B''_Q \bigr) - L^{-1} \sum_{\ell=1}^L F(B'_{Q,\ell} + B''_Q) \Bigr| \end{array} \right\} \le L^{-1/2} \|F\|_\infty$$

for any integer $L \ge 1$. Consequently it suffices to show that for any fixed $L \ge 1$, the random variable $L^{-1} \sum_{\ell=1}^L F(B_\ell^{(q,n)})$ converges in distribution to the random variable $L^{-1} \sum_{\ell=1}^L F(B'_{Q,\ell} + B''_Q)$ as $\min(q, n) \to \infty$. But this is a consequence of Theorem 3.2 and the Continuous Mapping Theorem, because

$$b \mapsto L^{-1} \sum_{\ell=1}^L F\bigl( b(\ell, \cdot) \bigr)$$

defines a continuous mapping from $\ell_\infty(\Lambda \times \mathcal{H})$ to $\mathbb{R}$. $\Box$